% Copyright 2018 Google LLC
%
% Use of this source code is governed by an MIT-style
% license that can be found in the LICENSE file or at
% https://opensource.org/licenses/MIT.

%!TeX spellcheck = en-US

\documentclass[eprint.tex]{subfiles}
\begin{document}
\section{Performance}\label{performance}

Efficient implementations of NH, Poly1305 and ChaCha are available for many
platforms, as these algorithms are well-suited for implementation with either
general-purpose scalar instructions or with general-purpose vector instructions
such as NEON or AVX2.
In \autoref{performancetable} we
show performance on an ARM \mbox{Cortex-A7}
processor in the Snapdragon 400 chipset running at \mbox{1.19 GHz}.  This
processor supports the NEON vector instruction set, but not the ARM cryptographic
extensions; it is used in many smartphones and smartwatches, especially low-end
devices, and is representative of the kind of platform we mean to target.
Where the figures are within 2\%, a single row is shown for both encryption and
decryption.

\begin{table}
    \caption{Performance on ARM Cortex-A7}
    \label{performancetable}
    \centering
    \begin{tabular}{lrr}
        \toprule
        Algorithm &
            \makecell{Cycles per byte \\ (4096-byte sectors)} &
            \makecell{Cycles per byte \\ (512-byte sectors)} \\
    \midrule
    \input{work/performance.tex}
    \bottomrule
    \end{tabular}
\end{table}

We have prioritized performance on 4096-byte messages, but we also tested 512-byte messages.
512-byte disk sectors were the standard until the introduction of Advanced Format in 2010;
modern large hard drives and flash drives now use 4096-byte sectors.
On Linux, 4096 bytes is the standard page size, the standard allocation unit size for filesystems,
and the granularity of \textit{fscrypt} file-based encryption, while
\mbox{\textit{dm-crypt}} full-disk encryption has recently been updated
to support this size.

For comparison we evaluate against various block ciphers in XTS mode~\cite{xts}:
AES \cite{AES}, Speck~\cite{speck1,speck2,speck3}, NOEKEON~\cite{noekeon},
and XTEA~\cite{xtea}. We also include the performance of ChaCha, NH,
and Poly1305 by themselves for reference.

We used the fastest constant-time implementation of each algorithm we were able
to find or write for the platform; see \autoref{implementation}.  As an
exception, given the high difficulty of writing truly constant-time AES
software~\cite{aes-cache-timing}, for single-block AES we tolerate an
implementation that merely prefetches the lookup tables as a hardening measure.
In every case the performance-critical parts were written in assembly language,
usually using NEON instructions.  Our tests complete processing of each message
before starting the next, so latency of a single message in cycles is the
product of message size and cpb.

\begin{table}
    \caption{Implementations}
    \label{implementation}
    \centering
    \begin{tabular}{llp{7cm}}
        \toprule
        Algorithm & Source & Notes \\
    \midrule
    ChaCha & Linux v4.20-rc1 & \texttt{chacha20-neon-core.S}, modified to support
        ChaCha8 and ChaCha12 \\
    Poly1305 & OpenSSL 1.1.0h & \texttt{poly1305-armv4.S}, modified to
        precompute key powers just once per key \\
    \mbox{AES} & Linux v4.17 & \texttt{aes-cipher-core.S}, modified to prefetch
        lookup tables \\
    \mbox{AES-XTS} & Linux v4.17 & \texttt{aes-neonbs-core.S} (bit-sliced) \\
    \mbox{Speck128/256-XTS} & Linux v4.17 & \texttt{speck-neon-core.S} \\
    \mbox{NOEKEON-XTS} & ours & \\
    \mbox{XTEA-XTS} & ours & \\
    \bottomrule
    \end{tabular}
\end{table}

Adiantum and HPolyC are the only algorithms in \autoref{performancetable} that are tweakable
super-pseudorandom permutations over the entire sector.  We expect any AES-based
construction to that end to be significantly slower than \mbox{AES-XTS}.

We conclude that for 4096-byte sectors, Adiantum
(aka \mbox{Adiantum-XChaCha12-AES}) can perform
significantly better than an aggressively designed block cipher (\mbox{Speck128/256}) in XTS mode.

\subbib
\end{document}
